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Abstract 

The combination of the multiple shooting strategy with the generalized Gauss-Newton al¬ 
gorithm turns out in a recognized method for estimating parameters in ordinary differential 
equations (ODEs) from noisy discrete observations. A key issue for an efficient implementation 
of this method is the accurate integration of the ODE and the evaluation of the derivatives 
involved in the optimization algorithm. In this paper, we study the feasibility of the Local 
Linearization (LL) approach for the simultaneous numerical integration of the ODE and the 
evaluation of such derivatives. This integration approach results in a stable method for the ac¬ 
curate approximation of the derivatives with no more computational cost than the that involved 
in the integration of the ODE. The numerical simulations show that the proposed Multiple 
Shooting-Local Linearization method recovers the true parameters value under different scenar¬ 
ios of noisy data. 

Key words and phrases. Multiple Shooting, Local Linear Approximation, nonlinear equations, 
parameter estimation, chaotic dynamics, generalized Gauss-Newton, line search algorithm 


1 Introduction 

Ordinary differential equations (ODEs) are extensively used for modeling the temporal evolution 
of complex dynamical systems in dissimilar fields such as physics, economy, ecology, biology, chem¬ 
istry and social sciences [1]. Typically, these ODEs contain parameters that are associated to 
phenomenological factors that control the basic variables interplay of the models. However, the 
values of such parameters are usually unknown and must be determined in such a way that the 
models reproduce the observed experimental data at best. Despite a time series analysis of ob¬ 
served experimental data can determine useful quantities that characterize the system dynamics 
(e.g., Lyapunov exponents, attractor dimension), identifying the system structure and estimating 
the corresponding parameters would be a matter of greater practical value. Thus, an accurate 
estimation of the non observed states and models’s parameters is not only critical to reproduce and 
describe a given dynamic behavior but also to understand the underlying causes of the analyzed 
processes. This is of particular importance for ODEs describing chaotic dynamics, where the tra¬ 
jectories of interest are very sensitive to small perturbations of the parameters and initial values 
m, la, a, a). In this circumstance, a major challenge is to find a proper numerical integrator 
able to preserve the stability of the solutions in situations of parameter-dependent instabilities in 
such a way that allows an accurate estimation of these parameters from noisy chaotic observations. 

‘Biospective Inc., Montreal, Canada 

^Montreal Neurological Institute, Canada 

Tnstituto de Cibernetica, Matematica y Fisica, La Habana, Cuba 


1 



Several strategies have been proposed for dealing with the parameter estimation problem in 
ODEs given a set of noisy observations. Among them, the so-called Initial Value approach is 
perhaps the most known. In this approach, the estimated parameters are those that minimize the 
least square errors resulting from fitting the numerical solution of the corresponding Initial Value 
problem to the given observation data. However, as it has been pointed out in m, m, 0, the 
estimators resulting from this approach are very sensitive to the initial guess of the parameters and 
usually turn out only local optimum solutions. A class of estimation methods that overcome this 
drawback was originally introduced in |6] and it is currently known as the Boundary Value approach 
(see, e.g., 0 , 0 , 0 , m)- This approach has two distinctive components: 1) the introduction of 
several multiple shooting nodes for solving the ODE as multiple Initial Value Problems (IVPs) in 
smaller subintervals, and 2) the solution of a constrained least squares problem in an augmented set 
of parameters. The main advantage of this multiple shooting strategy is that the whole observed 
data can be easily used to bring information about the true solution of the ODE [7]. Thus, the 
solution of the multiple IVP remains close to the true solution since the initial iteration of the 
optimization algorithm. In this way, the influence of the poor initial parameter estimates is 
considerably reduced. Besides, the splitting of the integration interval into multiple subintervals 
limits the error propagation and allows parameter estimation even for chaotic systems m, 0). 
Despite the introduction of additional variables seems to yield a more complicated estimation 
procedure, it is actually increasing computational efficiency and numerical stability of the estimation 
method [7|,0. A third estimation strategy, called nonparametric, employs nonparametric functions 
to represent the unknown solutions of the ODEs (see,e.g., HB, m, m. 0, m)- Typically, this 
class of estimators require two levels of optimization. The lower level approximates non parametric 
functions to the ODE trajectories conditional on the ODE parameters, while the upper optimization 
level does the estimation of the parameters of interest. Clearly, as compared to the previous 
two approaches, this procedure increases the computational burden of the parameters estimation 
process. 

As remarked in [8], a common difficulty of all these estimation strategies is the numerical 
computation of the derivatives of the trajectories with respect to the parameters of the ODE. With 
this respect, three main approximations have been commonly employed. The simplest one, finite 
differences, also called external differentiation 0, m is not usually recommended due to the high 
computational cost required for achieving numerically stable derivatives (see further discussion 
in Hoj). The second one, called internal differentiation, consists on differentiating the numerical 
integrator corresponding to the original differential equation [Sj, 0, US]- In general, internal 
differentiation is a mechanism less computationally intensive than the external differentiation but, it 
might introduce also high computational cost in the case of implicit integrators or integrators defined 
trough some numerical derivatives. The third approach (0, 0, [To]) consists on approximating 
the variational equations that describe the temporal evolution of the required derivatives, which 
must be integrated simultaneously to original equation. As in the second kind of approximation, 
this can be also computationally intensive for certain types of numerical schemes. 

In this paper, we study the feasibility of the Local Linearization (LL) technique (see, e.g., |T6] . 
m) for the simultaneous numerical integration of the IVPs and the evaluation of the numerical 
derivatives that appear in the multiple shooting method. In previous works m, [El, m this LL 
technique has been successfully applied for the parameter estimation of ODEs in the context of 
the Initial-Value approach. This has been possible thanks to the convenient trade-off between the 
numerical accuracy, stability and computational cost of the LL integrators and their capability of 
preserve a number of dynamical behaviors of the ODEs, which became relevant for the parameter 
estimation. In addition to this and following the ideas used in |21] for the computation of the 
Lyapunov Exponents, the LL technique can be used for the numerical integration of the variational 
equations associated to the derivative with respect to the parameters and initial conditions with 
no more computational cost than the that involved in the integration of the ODE. Therefore, the 
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application of the LL technique for identification of ODEs in the framework of Boundary Value 
approach is also attractive. 

The paper is organized as follows. In Sectionthe essentials on the Multiple Shooting strategy 
and the generalized Gauss-Newton algorithm are presented. Section is focused in the link of 
the LL technique to the multiple shooting method. The resulting algorithm for the parameter 
estimation is summarized in this section as well. The performance of the Multiple Shooting-Local 
Linearization method is presented in Section [^throughout three numerical examples. Finally, some 
discussion and conclusions are presented in the last two sections. 

2 Multiple Shooting Method 

Let us consider the d-dimensional ODE 

x = t € [to,T] (1) 

depending on a p-dimensional vector p of parameters, where f : M X X —>■ is a smooth 

function. 

A typical estimation problem for ODEs consist of finding optimal values for the parameters p 
based on the observation of some values of the state variable x contaminated with noise (i.e., data 
points). That is, suppose that a number of N observed data points Zj related to the state variables 
X and parameters p via the observation equation 

Zi = g{t*,x{t*),p) + ei, ( 2 ) 

are given at the time instants t* G [tQ,T], i = 1, where g : M X X RP —>■ M'" is a smooth 

function, and denotes the measurement errors. If the measurement errors are assumed indepen¬ 
dent, Gaussian distributed with zero mean and known variance <7^, then the minimization of the 
weighted least-squares objective function 

N V 

J{P) = -S^{t*,x{t*,p),p)f 

i=l j=l 

with respect to p yields a maximum likelihood estimator for the parameters of the ODE Q. 

2.1 Nonlinear optimization problem 

Formally, the least squares problem described so far is a unconstrained optimization problem of 
the type 

mm{||Fi(p)||2}, 

where Fi(p) = uec(M(p)) is a Vu-dimensional vector, M(p) is a N xv matrix with entries lV[-^*(p) 
= a~^(zj — g^(t*,x(t*), p)) for all i = 1, ..., N and j = 1,..., v, and vec{.) denotes the vectorization 
operator. 

However, in many applications, certain initial/boundary problems as those that appear in con¬ 
trol engineering problems (see, e.g., |22j ) additional requirements for the solutions and parameters 
must be satisfied. Mathematically, these restrictions are represented by a vector of (component 
wise) equality and/or inequality conditions of the form 

R'(^i>x(ti), ...,t*j^,x{t%f),p) = 0 or >0. 

In this situation, our estimation problem is reformulated as a constrained optimization problem of 
the form 

min{||Fi(p )||2 | R 2 (p) = 0; R3(p) > 0}, (3) 

P 
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for certain functions R 2 and R 3 . 

The multiple shooting approach for solving the optimization problem consists on the intro¬ 
duction of m -|- 1 grid points to = tq < ... < Tm = T on the interval [to,T] and new parameters 
Sfc = x(rfc), k = 0, ...,m such that the solution of the original equation 0 can be approximated by 
the solution of a set of independent initial value problems 

X = f(t,x,p); t e [rfc,rfc+i] (4) 

x(Tfc) = Sfc. 

which, in principle, generate a discontinuous trajectory {x(t; r*,, s^, p), t G [r^, Tk+i), fc = 0,..., m — 
1}. These introduced shooting values act as new parameters for the associated optimization 
problem (j^ that should be solved for the augmented parameters q= (p, Sq, ..., Sm). Thus, the 
optimization problem ^ is rewritten as 

min{||Fi(q )||2 | F 2 (q) = 0; R3(q) > 0}, (5) 

q 

where the vector-valued function F 2 contains the equality restrictions R 2 and the continuity con¬ 
ditions 


Cfc = x(rfc+i; Tfc, Sfc, p) - Sfc+i = 0, fc = 0 ,..., m - 1 . ( 6 ) 

Notice that, the purpose of the imposed continuity conditions Q is to guarantee the continuity 
of the final approximated solution of the original equation 0 rather than updating the shooting 
values Sfc from interval to interval in Q. In fact, the initial value problems Q can be independently 
solved following a proper parallel running implementation. 


2.2 Linearized optimization problem 

Clearly, ([^ represents a very large constrained non-linear optimization problem that need to be 
solved via iterative methods. As originally proposed in | 6 ], the damped generalized Gauss-Newton 
method is a suitable choice. Thus, starting with initial guess ...,Sm^), the Gauss- 

Newton iteration is given by 

qi'+l) =qW+a^Aq;, / = 0,1,..., (7) 


where 0 < a; < 1 is a local damping parameter, the increment Aq/ is the solution of the linearized 
problem 


mm ' 
Aq; 


Fi(q^'^) + ^(q^'^)Aq/ 


I F2(q^'^) ^(q('))Aq; = 0; R3(q^'^) + ^^(q^'^)Aq; > 0 I , 

2 I 


dR.s 


and the iteration stops when the absolute error condition 

qd+i) _ q(0 < £ 


holds for certain given tolerance e > 0 . 

As pointed out in [ 6 ] , it is convenient to choose some of the observation time points t* as member 
of the set of multiple shooting grid points tq < ... < Tm- Thus, the choice of the initial parameters 
..., Sm^) can be based on the prior information given by the observation data points, which is a 
recognized advantage of the multiple shooting approach. In that way, despite being discontinuous, 
the initial trajectory {x(t; r^, t G [Tk,Tk+i), k = 0, ...,m — 1 } can remains relatively close 

to the observed data points. 
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For simplicity in our exposition, from now on we will confine to the equality constrained case. 
However, as pointed out in ([B]), the following results can be straightforwardly extended to the 
inequality constrained case. Thus, the optimal solution of the linearized problem 


min 

Aq; 


Fi(q«) + 


aFi 

5q 


(q('))Aqz 


2 

2 


F 2 (q«) + ^(q«)Aqz 



is given by 


where F 


Fi 

F 2 




OTi 

dFi 

9Fi 

9Fi 

dso 

9s 1 

dSm 

9p 

dR2 

9R2 

9R2 

9R2 

dso 

9s 1 

d^m 

9p 

dCQ 

dsQ 

-Id •• 

0 

9co 

9p 


and 


V 0 


^^m — l _T 1 

dSm-l Sp / 




denotes a generalized inverse of the Jacobian 


9q 


(i.e. 


/ 9F^ 

A ^ 1 


1 9q , 

1 9q 

i 9qi 



( 8 ) 

(9) 


( 10 ) 


( 11 ) 


2.3 Equivalent condensed problem 

A major challenge in the computation of the optimal solution of the linearized problem 


is the 


algebraic manipulation of the Jacobian matrix (10) and, in turn, the computation of the generalized 
inverse ©• Notice that the Jacobian ( |10[ ) is a high dimensional matrix of dimension at least 
Nv + d{m — 1), which makes the direct evaluation of the formula (0 computationally unfeasible 
for a large number of either observed data points or multiple shooting nodes. However, the sparse 
structure in the bottom side in the Jacobian (10) allows a convenient recursive elimination of the 
variables As^, Asi. Following [6], a backward recursion can be implemented as 


(m) 
1 

(m) 
2 

For i 


U 


U, 


II 

II 

(12) 

^P-2 cjm) 

dRa 


• dp ’ ^ 

dSm 



= m,m — 1 ,..., 1: 


U 


1) . 

= 

' + S«C,_1, 

P>(^ 1) 

_ 13(0 

' + s® 

/ dci_i \ 

Q(i— 1 ) 

dFi 

+ s 




V dp J 


dsi_i 

1) . 

— 

— *-^2 

' + S?c,_i, 


_ 13 (b 

' + s® 

f dcj_i\ 

q(*— 1) 

._ dPl2 

+ S. 


^2 

• ^2 

V J 

? ^2 

dsi-i 




which transforms the problem (1^ into the equivalent condensed problem 


dsi-i 


dsi-i 


min 

Aso,Ap 


+ sf Aso + P)^^Ap 


( 0 ), 


+ sl°Aso + P^'^^Ap = 0 


(o)a„ - 


(13) 


in the variables Asq and Ap. Notice that, as compared to ([^, the condensed problem (13) is of 
lower dimension due to the elimination of the variables As^,.... As 
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Depending on the nature of the original optimization problem ^ , the solution of the condensed 
problem can be simplihed in several ways. The simplest situation is the one where no equality 
constrains are required. In this case, the solution of the condensed problem can be found by 
solving the system of normal equations (X'X)/3 = X'y with X = ^ ^ i y = — 

Aso 


and 


(3 = 


Ap 


Another simple situation is where there are no equality constrains other than an 


initial condition x(to) = xq for the equation Q. In this case, it can be seen that sq = xq, Asq = 0 
and the condensed problem is solved with X = and (3 =Ap. For a more general case of 
equality constrains, the condensed problem can be solved by using algorithms specifically designed 
for linear least squares problems with linear constrains (see [23], |2Tj, |25| for instance). Once the 
condensed problem has been solved for Asq and Ap, the remaining variables As^,..., Asi can be 
obtained by the forward recursion 


Asj+i = ( ^ ) Asj + ( ^ ) Ap + Cj, z = 0,..., m - 1. 


\dsi 


(S) 


(14) 


2.4 Damping parameter estimation 

It is well-known that the Gauss-Newton iteration 0 with ai = 1 guarantees local convergence to 
a solution q* of the problem 0- However, in practical applications, it is not possible to choose 
initial parameters guess for guaranteeing iteration convergence to the optimal global solution. 
Thus, in order to extend the global convergence domain, the damping parameter 0 < a; < 1 should 
be chosen to unsure the decreasing of an appropriate level function T(q) (i.e., L(q(^'''^)) < L(q(^^)). 
As pointed out in [6|, this monotonicity test is only feasible when the increment Aq; is a descent 
direction of the level function at q^*^. An appropriate choice of L is then given by the locally defined 
natural level functions m, IS]) 


^Kq) 



for which ^(q^^^) = — Aq; (i.e., Aq; is the steepest descent direction of L/ at q*-^^). 
The damping parameter ai is then determined by 


min{L/(q(') a^Aq;)}, 

oil 


(15) 


which can be solved by any line search algorithm ([27|, |28|, [29|). Notice that a line search algorithm 
is also an iterative procedure that would require additional evaluation of the function F at some 
points q^^^ -|-a^^^Aq;, u = 0,1,.... Correspondingly, an extra computationally burden appears 
during the numerical evaluation of the terms 




Indeed, by using similar arguments to the ones employed for deriving (13), we can easily observe 
that the term —r“ is the optimal solution of the linear squares problem 


mm 


Fi(q«+a;“)AqO + ^(q®)rr 


F2(q« + aS“^Aq;) + ^(q^'^)rr = 0 L 


which can be solved by reducing it to a corresponding condensed problem. 

In order to avoid the intensive evaluations required in full line search algorithms, we have 
employed a modified line search method m, [To]) that naturally adapts to the geometry of the 
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problem. Specifically, the modified line search algorithm consists on finding an upper bound for 
the natural level function Li{q), evaluated at q = + a/Aq^, which is given by (see details in [U] 

and [To]) 

Lz(q(^) + a^Aq/) < (l-ai + afw{q^^\ai)J 


where 'u;(q, a) is a function that characterizes the nonlinearity of the optimization problem (15). 
The importance of w{c[,a) is given by the fact that (see proof in (inj); for an arbitrarily chosen 
T] G (0,2], any ai G (0, a*] satisfies the required descending property 


ToiAq;) < 


(16) 


where a* is given by 


a* = min 



_ V _\ 

w{q^’-\a*) ||Aq;||/ 


Since w is unknown a priori, an estimator is given by 


w{q^'‘\ai) = 2 


F(q(') + aiAqi) - (1 - az)Aq/ 

l|a;Aq/f 


(17) 


(18) 


Then, a predictor-corrector procedure can be constructed from the two previous expressions. That 
is, starting with an estimate t(;(q^^“^), a^-i) from the previous Gauss-Newton iteration I — 1, the 
initial guess is determined according to 


a 


( 0 ) 


= min I 1 


w{q('' i),a,_;^) ||Aq;|| 


If the descending property (16) holds, then we should take ai = as the optimal damping 

parameter. Otherwise, w has to be re-estimated from ( |l8[ ) with ai = and the process has to be 

repeated until the descending property (16) be satisfied (see a detailed algorithm implementation 
in [TU]1. 


3 Multiple Shooting - Local Linearization method 

Since analytical solutions x of the ODE ([^ are generally unknown, the objective function J(p) is 
typically approximated by 

N V 

i=l j=l 

where x(t*,p) denotes a numerical approximation to x(tt). Therefore, numerical approximations 
to functions Fi and F 2 as well as theirs derivatives are needed for evaluating the iteration ([^. In 
this section it is shown how the initial value problems (Q as well as the corresponding variational 
equations respecting to the initial value and the parameters are numerically approximated by the 
so-called Local Linearization approach. 

3.1 Local Linearization integrators 

In addition to the IVP (Q , let us consider the associated variational problems corresponding to the 
initial value 

Q{ 

= — (t,x,p)X*G t G [Tfc,rfc+i] (20) 

= L, 
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for all k = 0,m — 1, where Here, by definition, X^'' = 0^ for t ^ [r/c, Tfc+i]. Consider 

also the variational problem corresponding to the parameters p 


= ^(^,x,p)XP + —(t,x,p), t e [rfc,rfc+i] 


( 21 ) 


XP(ri) = Orf 




where X^ = g. 

Denote by T^{h) = {vk < < Tk+i : n = 0,1, ...,Nk} a time discretization of the subinterval 

[Tk,Tk+i] with = Tfc, = Tfc+i, h’^ = - t’^ < h ior h > 0, and satisfying t* € T^(/i) for 

those observation time points t* such that < t^+i- Since the observation time points t*, 

i = 1, ...,N have a fix location over the interval [to, T], any time discretization T^(/i) containing more 
than 2 observation time points does not likely have equally spaced time points over the interval 
[Tk,Tk+i]- Thus, a numerical integration with a fix step size h is, usually, unfeasible. Instead, an 
adaptive step size strategy is in order. For the remaining of our exposition, it is assumed that the 
time discretization T^(/i) have been constructed under the adaptive step size strategy proposed 
in m for the LL integrators with relative and absolute tolerances RelTol and AbsTol. A slight 
modification to this adaptive strategy for including the fix observation time points ft G T^{h) has 
been implemented here. 

The Local Linear approximation y to the solution x of Q is obtained from the local (piece-wise) 
linearization of the function f respecting to x and t, and the exact computation of the resulting 
linear IVP 


5f 


<9f, 


y = ya,p) + ya,p){y - ya) te 


5x 


dt 


t t 


n+l 


( 22 ) 


y{tn) = yts: 


with y(t^) = y^k = Sfc for all n = 0, ...,iVfc (see, e.g., [IS],[IT]). 

By following the same ideas used in [21] for computing the Lyapunov Exponents, the derivatives 
X^'' and X^ can be approximated by the solution of the variational equations 


_ 'V’Sfc 


Y®'=(t^) = y; 


‘"n 


‘'n+1 


(23) 


and 


Y" = ^(Cyi^,P)Y^ + ^(Cy,k,p), tG 


- VP 


Y^(t^) = 


''n! ^n +1 


(24) 


respectively, with Y^'=(t§) = = 1^ and YP(t^) = Y^^ = O^xp- Notice that, by construction, 

Y®r = Orf for r / fc, n = 0,1,..., N^. 


The solutions y, Y^*" and Y^ of the equations (22), (23) and (24) can be straightforwardly 


derived by using their integral representations obtained in [IS] and [21] combined with the formulas 
for computing integrals of exponential matrices proposed in m- That is. 


y^^, =ya +Ei4(ytO, n = o,...,X a ,-1 


‘'n + l 

Yf,^ = En(yAs)Yf^ n = 0,..., Nj, - 1 


(25) 

(26) 


and 


Yffc =En(ytk)Y^i,+'Ei 2 (ytk), n = 0,...,Nk-l 


n + 1 


(27) 
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where the vectors Ei 4 (yjfc), Ei 2 (y 4 fc) and the matrix Eii(yjfe) are specific block components of the 
exponential matrix 


exp(/i^C) = 


Eii(ytfc) Ei2(yife) Ei3(y^fc) Ei4(y*0 


with C gM('^+P+2) ('^+P+2) defined as 


C = 


£(imyi^p) §,itn,yt!s,p) §itn,yt!^,p) f(in>yt^p) 


9f r+k 


df (+k 


ep 


0 

0 

0 


0 

0 

0 


It is worth noticing here that the numerical implementation of LL schemes (25), (26), (27) reduce to 
the use of a convenient algorithm for computing matrix exponentials, e.g., those based on rational 
Fade approximations [32], the Schur decomposition |32| or Krylov subspace methods [33]. The 


selection of one of them will mainly depend on the size and structure of the matrix C. For instance, 
for many low dimensional system of equations one could use the algorithm developed in |34j , which 
takes advantage of the special structure of the matrix C. Whereas, for large systems of equations, 
the Krylov subspace methods are strongly recommended. 


Notice also that the equations (22), (23) and (24) are not the result of applying the standard 


local linearization technique simultaneously to the set of equations (Q, (20) and (21). Instead, 
an appropriate local linearization approach has been chosen in order to decouple the system of 
equations Q, (20) and (21). Indeed, (22) is the local linear approximation to equation (Q but 
equations ( |23| ) and (24) are suitable linear equations with locally constant coefficients. Nevertheless, 
it has been proved in [2T] that 


sup ||X"(t)-Y‘ 


< 


where the constant does not depend on h. Correspondingly, following similar arguments to the 
ones employed in Theorem 4 of [H], it can be also proved that 

sup \\lU>{t)-YP{t)\\<C^^h, 

\Tk fc + l] 


for certain constant C^. Thus, despite 


k 1,2 


sup ||x(t) — ytll < C^h 


for certain constant (see proof in |16)). the system of equations (22)-(24) has global order of 
convergence equal to 1. In other words, the numerical derivatives X® and can be approximated 
with global order of convergence 1 and no extra computationally cost but the one involved in 
the implementation of the local linearization schemes. Remarkably, it has been also avoided the 
manipulation of second order derivatives like to ones that would certainly appear with the employ of 
internal differentiation in the equation (22). Additionally, under request, the Lyapunov exponents of 


the ODEs might be straightforwardly approximated from the solution Y^ by following the algorithm 
developed in 
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3.2 Parameters estimation algorithm 


The Multiple Shooting-Local Linearization algorithm for estimating the unknown parameters p of 
the model 0 -(§ proceeds by inserting the LL approximations of the previous subsection into the 
minimization objective function (19), namely, 


N V 




j=i j=i 


where yt* denotes the LL approximation to x(t*), i = Correspondingly, the continuity 

constrains and additional equality constrains take the form c^. = y^k — k = 0, ...,m — 1 , 

and R 2 = R 2 (ti, yt*, p), respectively. Analogously, the functions Fi(p), F 2 (p) and R 3 

of the Section must be redefined in terms of the approximations y, and Y^ to x, and 
X^. Indeed, from now on, Fi(p) = t;ec(M(p)) with M'^*(p) = — g-^(t*, yt*, p)), 


dFi 

dsk 

dFi 

dp 


5 R 2 

dsk 

5 R 2 

dp 

dck 

dsk 

dck 

dp 


[|^(t^yt*,p)Y,"^;^(f2,yt*,p)Y|;...;^(t^,yt^,p)Yj], 

[|^(t^,yt*,p)Yf. + ^{tl,yt*,py, |^(t^,yt*,p)Yf* + yt*, p); 

^{t*N,yt*^,p)Yy^ + ^{t*N,yt*^,p)], 

^ yiL • • - > ytlV ’ p) 


2=1 

N 


t* 




(28) 


'Y'^k 

^ .k ’ 


YP 


where denotes the algebraic operation of concatenating matrices with equal number of 

columns by their rows. Here, Yp and Y^* denote the LL approximations to X^*’ (t *) and X^(t*),respectively. 
The parameters estimation algorithm is then summarized in the following steps: 


1. Setting I = 0 and initial guess = (p^^^, Sq°\ ..., Sm^) for the parameters and shooting 
nodes, 

2. With p = p(^^ and k = I, ...,m, compute yt*,Y^!^ and Yf* as indicated in Section 


3.1 for all i = 1, Then, evaluate the expressions (28), 


3. Compute the increments Aq^ in ([^ by either direct evaluation of the Jacobian (10) and the 
generalized inverse 0 or evaluating the backward and forward iterations ( |12[ ) and ( |14[ ) in 
the condensed problem. 


4. Compute the damping parameter ai by the modified line search algorithm according to 0 - 

0, 

5. Iterate the Gauss-Newton algorithm qJ+^l = q(b -|- aiAqi, 

6 . Set 1 = 1 + 1 and repeat steps (2)-(5) until — q^^^ll < e for a given tolerance e > 0. 
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3.3 Variance estimation 


In practical situations, the variance cr^ of the observation errors in ([^ is also an unknown parameter 
that should be estimated, namely, by extending the parameter p with the inclusion of a. However, 
since only the function Fi does depend on a, the inclusion of a in the Gauss-Newton iteration 
process would unnecessarily increase the dimension of the problem. An alternative estimation for 
a is then computed as 


N V 
i=lj=l 


Obviously, in this case, the estimated p is not longer maximum likelihood estimator. 

4 Numerical Experiments 

In this section, the performance of the Multiple Shooting-Local Linearization approach is illustrated 
through three numerical examples. The first example, extensively studied in [3] , is a 4-dimensional 
chaotic system defined by a vector field that is linear respecting to the unknown parameters. 
The second example corresponds to the well-known FitzHugh-Nagumo system, which is defined 
nonlinearly respecting to the parameters of interest. The last example correspond to the Rikitake 
system [35], which is known for generating chaotic trajectories fro certain parameters combination. 
For the three examples, the parameters were estimated with a stopping tolerance of e = 10““^ and 
the shooting points were selected within the set of the observed time points t*, i = 1, V, in an 
approximately equispaced manner. For each t*, i = the LL approximations and 

Y^* were adaptively computed with relative and absolute tolerances RelTol = 10“^ and AbsTol = 
10 "®. 

Example 1. Consider the Henon-Heiles system described by the 4-dimensional ODE (see 
details in |3|): 



Xl 

X2 

3^3 

Xi 


X3 

X4 

—axi — 2 xiX 2 
—bx 2 — x\ — cx\, 


with parameters p = (a, 5, c). The ’’true” trajectory in the interval [0,10] is shown in Figure 1 for 
p = (1,1, —1) and initial condition xq = (0,0,0.3, —0.4). This ’’true” trajectory x was generated 
by the Local Linearization method with a fixed step size oi h = 2“^^. A realization of N random 
observations Zj, is generated by randomly selecting N points ft, i = 1,..., N, in the interval [0,10] 
(with uniform distribution) and adding a Gaussian noise with zero mean and variance to the 
value x(t*). That is, 

Zi = x(t*) -h crci, Ci ~ iV(0,1), i = 1,..., V, 

with Y(0,1) denoting the Gaussian normal distribution. A number of 1000 of such realizations were 
generated for different values of a and N. These 1000 realizations were arranged into 20 batches 
of 50 realizations each, where each batch corresponds to a fix distribution of the observation time 
points ft, i = 1,...,Y. The distribution of the observation time points then varies from batch 
to batch. The goal was to estimate the parameters p, xq and a in each realization. For each 
realization, the initial parameters guesses were set at p^^^ = (9,1,2) and = 1, and m = 50 
shooting nodes were distributed over the interval [0,10]. 
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It should be noticed that the integration of this chaotic system with initial condition xq = 
(0,0,0.3,—0.4) and parameter p = (9,1,2) leads to numerically unstable solutions (i.e. numerical 
explosions) after t = 4.4 even with a very small fixed step size oi h = 2“^^. This evidences 
that the classical initial value approach estimation is not suitable in this scenario. Instead, more 
sophisticated methods like the multiple shooting approach presented here seems to be a proper 
choice. 

The estimated parameters are reported in Table 1 as the average within the batch (i.e. average 
across 100 realization of fixed observation time points distribution) and then average and standard 
deviation across the 20 batches. Notice that such a summary should not be confounded with the 
so-called a posteriori analysis (see m) that is usually carried out for statistical inference of the 
estimated parameters (e.g. variance-covariance matrix and confidence interval for the estimated 
parameters). 



a = 

0.05 

a = 0.1 


N 

100 

200 

100 

200 

a 

1.0002 ± 0.0006 

0.9998 ± 0.0007 

1.0008 ± 0.0021 

1.0003 ±0.0014 

b 

0.9986 ±0.0017 

1.0003 ± 0.0020 

0.9973 ± 0.0045 

0.9997 ±0.0026 

c 

-0.9993 ± 0.0042 

-0.9991 ± 0.0025 

-1.0018 ±0.0068 

-0.9989 ±0.0061 


-0.0001 ± 0.0014 

-0.0004 ±0.0011 

0.2995 ± 0.0006 

-0.4005 ± 0.0008 

0.0001 ± 0.0009 

0.0001 ± 0.0010 

0.2997 ±0.0005 

-0.3999 ±0.0005 

-0.0016 ±0.0031 

-0.0003 ±0.0022 

0.2990 ±0.0015 

-0.4003 ±0.0018 

0.0006 ±0.0019 

-0.0001 ±0.0013 

0.2998 ± 0.0008 

-0.4003 ± 0.0015 

a 

0.0496 ± 0.0001 

0.0501 ±0.0001 

0.0992 ± 0.0001 

0.0997 ±0.0001 

N.Iter. 

6.4320 ± 1.0058 

5.0220 ±0.4527 

7.5070 ± 0.7993 

6.8910 ±0.3061 


Table 1. Estimated parameters and number of required Gauss-Newton iterations {N.Iter.) 
corresponding to the Henon-Heiles system. 


Figure 1 shows the true trajectory with initial condition xq = (0,0,0.3, —0.4) and N = 100 noisy 
observations corresponding to one realization with a = 0.1. This figure also shows the approximated 
discontinuous trajectory after the first iteration as well the estimated optimal trajectory after 
I = 6 iterations of the Gauss-Newton method. This optimal trajectory corresponds to the estimated 
parameters xq = (—0.0231,-0.0008,0.3055,-0.3899), p = (1.0314,0.9839,-1.0101) and a = 
0.1029. Notice that the hrst iteration produces a discontinuous trajectory due to the continuity 
conditions Q are unable to be satisfied at this stage of the optimization process. However, after 
only four iterations, the estimated parameters and shooting nodes produce an optimal continuous 
trajectory that is quite close to the true trajectory of the problem. 

Example 2. Consider the FitzHugh-Nagumo ODE, which is a simplihed version of the well- 
known Hodgkin-Huxley model for describing activation and deactivation dynamics of a spiking 
neuron: 

i/3 

^ = c{V- — + R) 

R = -^{V -a + bR), 
c 

where V and R denote the voltage across an axon membrane and the outwards currents, respectively. 
Here, a, b, c are parameters to be estimated from n = 400 noisy observations of the variable V, 
which were randomly distributed (with uniform distribution) in the interval [0,20]. Similarly to 
[l2] and [ 8 ], the true trajectory was generated with initial values H( 0 ) = —1 and i?( 0 ) = 1 and 
true parameters a = 0.2, b = 0.2 and c = 3. The noisy observations were generated by adding 
a Gaussian noise with standard deviation a = 0.2. The initial parameter guess was set p^^^ = 
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Figure 1: Initial and optimal trajectory corresponding to the Henon-Heiles system with 
estimated initial condition xq = (—0.0231,—0.0008,0.3055,—0.3899) and parameters p = 
(1.0314, 0.9839, -1.0101) and a = 0.1029. 

) = (2,2,5) and = 1. A number of m = 50 shooting nodes were approximately 
equispaced over the set of observation time points. Since only the variable V is observed in the 
case and no additional information if available for the variable R at the shooting points, we set the 
second component of equal to zero for all /c = 0 , ...,m. 

The estimated parameters resulting from 1000 realizations (20 batches of 50 realizations each) 
were a = 0.2007±0.0023, b = 0.1932±0.0068, c = 2.9794±0.0113, xq = (-1.0019±0.0123,1.0085± 
0.0121) and a = 0.2016 ± 0.0010. Figure 2 shows the true, initial and estimated trajectories after 
I = 29 iterations. Notice that a larger number of iterations were required in this case probably 
caused by the very bad (far away from the true trajectory) initial guess of the second component in 
the shooting nodes. The estimated trajectory corresponds to parameters with values a = 0.1971, 
6 = 0.2210 and c = 2.9716. 

Example 3. Consider the Rikitake model defined by the ODE 

Xl = -/UXl + X2X3 

X2 = —axi — HX2 + X1X3 

X3 = I-X1X2, 

which was originally introduced by [35] to explain geomagnetic polarity reversals. The model 
consists of coupled, self-excited disc dynamos, where the parameter // > 0 and a represent the 
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Figure 2: Initial and optimal trajectory corresponding to the FitzHugh-Nagumo system with esti¬ 
mated parameters a = 0.1971, b = 0.2210 and c = 2.9716. 

resistive dissipation and the difference of the angular velocities of two dynamo discs, respectively. 
Despite the physical meaning of ^ is still not clear, estimates of geophysically plausible value for 
/i vary between 10“^ and 10 [36]. Most of the studies for explaining the dynamical behavior of 
the Rikitake system focus on the parameter space determined by the pairs (/i, K) , where a. = 
h{K‘^ — K~‘^) (see |36| for instance). Thus, combinations of the pairs produce different 

dynamical regimes, like the chaotic regime determined by /r = 0.5 and a = 0.46125 {K = 1.25). 

For this example, the parameters /i and a are going to be estimated from N = 200 noisy ob¬ 
servations of the three variable, randomly distributed (with uniform distribution) in the interval 
[0,40]. A ’’true” trajectory was simulated with initial value xq = (—2,—2,0) and the noisy obser¬ 
vations were generated by adding a Gaussian noise with standard deviation a = 0.1. The initial 
parameter guesses were set at = (5, 5) and = 1. The following table presents 

the estimated parameters for different numbers of shooting nodes, including the case m = 0 cor¬ 
responding to the Initial Value approach. The estimated parameters are reported by the average 
and standard deviation over 100 different realizations of the observations Zj with a fix (random) 
distribution of the N observation time points i = 1, Notice that the average and standard 

deviation were calculated only across those realizations where the estimation algorithm converged 
after a maximum number of 50 iterations. In fact, this table also shows the required number of 
Gauss-Newton iterations (N.Iter.) that the algorithm needed to converge as well as the percentage 


14 











of convergence {%Conv.). 


m 

60 

40 

30 

20 

V- 

0.5001 ± 0.0005 

0.5005 ±0.0137 

0.4998 ±0.0199 

0.4965 ± 0.0359 

a 

0.4613 ±0.0010 

0.4623 ±0.0194 

0.4581 ± 0.0248 

0.4351 ±0.0719 

xo 

-1.9992 ±0.0269 

-1.9993 ±0.0204 

0.0002 ± 0.0490 

-2.0121 ±0.1066 

-2.0551 ±0.1654 

0.0074 ±0.0927 

-2.0183 ±0.1583 

-2.0297 ±0.1927 

0.0311 ±0.1913 

-2.1491 ±0.3754 
-2.1228 ±0.4045 

- 0.4141 ±0.5947 

a 

0.0999 ± 0.0033 

0.1324 ±0.1926 

0.1798 ±0.3397 

0.3764 ± 0.6537 

N.Iter. 

15.51 ± 1.13 

18.73 ±3.94 

19.07 ±5.72 

27.35 ± 7.26 

%Conv. 

99 

88 

77 

17 


Table 2. Estimated parameters, number of required Gauss-Newton iterations (N.Iter.), 
and percentage of convergence (%Conv.) corresponding to the Rikitake system. 


Notice that as the number of shooting nodes decreases, the estimated parameters become less 
accurate and the number of non convergent realizations increases. In fact, the simulations cor¬ 
responding to m = 10 and m = 0 showed no convergent realization at all, which evidences the 
efficacy of the multiple shooting method as compared to the Initial Value approach. Importantly, 
recall that, due to the equivalent condensed problem, increasing the number of shooting nodes 
does not increase the dimensionality of the optimization problem. Therefore, as a rule of thumb, 
it is recommendable to employ the multiple shooting approach with a relatively large number of 
shooting nodes, particularly for those system showing complex, chaotic dynamics. 

5 Discussion 

The methodology presented here can be extended in several ways. As it was mentioned earlier, 
only the case of equality constrains for parameters and state variables has been treated here. For 
inequality constrains, it is easy to check that the condensing recursion is exactly the same as for 
equality constrains. Therefore, the solution of the condensed problem must be obtained by more 
general optimization strategies like active set strategy (see details in [6] and [7]). Additionally, 
we have assumed a very simple assumption for the measurements errors that define the set of 
observed data points. Namely, uncorrelated and equally distributed errors have been assumed for 
the components of the multi-dimensional data. This scenario can be easily extended to the more 
general case of correlated errors by replacing the parameter by a variance-covariance matrix 
S and defining a proper formulation of the function Fi(p). Correspondingly, the observed data 
and the measurements errors might define more complicated statistical models like mixed effects 
models to cover, for instance, the cases of repeated measures at certain time points and temporarily- 
correlated errors. 

Finally, the multiple shooting-LL approach can covers a more general class of models driven by 
random differential equations (RDE). Essentially, a RDE is a non autonomous ODE coupled with 
a stochastic process, which is usually employed for modelling noisy perturbations of deterministic 
systems. Thus, in principle, a RDE can be integrated by applying conventional numerical methods 
for ODEs, like the LL integrator presented here m- In fact, the LL method for RDE has been 
already successfully applied for the generation of EEG rhythms by means of realistically coupled 
neural mass models |38j . A possible extension consists of having more realistic neural mass models 
with certain free parameters that could be estimated from observed EEG data via the multiple 
shooting approach. 
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6 Conclusions 


In this paper we have shown the feasibility of the multiple shooting approach in combination with 
local linearization techniques for parameter estimation in ordinary differential equations. The main 
advantage of the proposed approach consists of approximating the numerical derivatives involved 
in the multiple shooting scheme by a numerically stable method at no extra computational burden 
but the one required for the numerical integration of the original equations. The performance of 
the proposed approach has been evaluated in three different numerical examples. In all cases, the 
multiple shooting-local linearization method accurately recovered the true parameters values. 
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